A Cluster Architecture for Parallel Data Warehousing

Authors

  • Frank Dehne
  • Todd Eavis
  • Andrew Rau-Chaplin
Abstract

We describe the parallel, cluster-based implementation of an algorithm for the computation of a database operator known as the datacube. Though a number of efficient sequential algorithms have recently been proposed for this problem, very little research effort has been expended upon cost-effective parallelization techniques. Our approach builds directly upon the existing sequential proposals and is designed to be both load balanced and communication efficient. We also provide experimental results that demonstrate the viability of our technique under a variety of test conditions. Ultimately, we show that parallel performance relative to the underlying sequential algorithm (speedup) is near optimal.

Related resources

Architecture of a Highly Scalable Data Warehouse Appliance Integrated to Mainframe Database Systems

Main memory processing and data compression are valuable techniques to address the new challenges of data warehousing regarding scalability, large data volumes, near realtime response times, and the tight connection to OLTP. The IBM Smart Analytics Optimizer (ISAOPT) is a data warehouse appliance that implements a main memory database system for OLAP workloads using a cluster-based architecture...

A Survey of Parallel and Distributed Data Warehouses

Data Warehouses are a crucial technology for current competitive organizations in the globalized world. Size, speed and distributed operation are major challenges concerning those systems. Many data warehouses have huge sizes and the requirement that queries be processed quickly and efficiently, so parallel solutions are deployed to render the necessary efficiency. Distributed operation, on the...

DataGarage: Warehousing Massive Amounts of Performance Data on Commodity Servers

Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, we show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number of servers because of the scale and the comple...

Aggregate-Based Query Processing in a Parallel Data Warehouse Server

In the last years data warehousing has emerged as a fundamental database technology providing the basis for online analytical processing (OLAP). In general, analytical queries involve aggregations of large data sets. This results in serious performance problems if ad-hoc queries are to be answered on-line. One method to avoid performance bottlenecks is to use parallel hardware, i.e. SMP or MPP ...

DataGarage: Warehousing Massive Performance Data on Commodity Servers

Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, we show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number of servers because of the scale and the comple...

Publication date: 2001